A Separate-and-Learn Approach to EM Learning of PCFGs

Authors

  • Taisuke Sato
  • Shigeru Abe
  • Yoshitaka Kameya
  • Kiyoaki Shirai

Abstract

We propose a new approach to EM learning of PCFGs. We completely separate the process of EM learning from that of parsing. For the former, we introduce a new EM algorithm, called the graphical EM algorithm, that runs on a new data structure called support graphs, extracted from the WFSTs (well-formed substring tables) of various parsers. Learning experiments with PCFGs using two Japanese corpora indicate that our approach can significantly outperform the existing approaches using the Inside-Outside algorithm (Baker, 1979) and Stolcke's EM algorithm (Stolcke, 1995).
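To make the separation of parsing and learning concrete, here is a minimal sketch in the spirit of the graphical EM algorithm, not the authors' implementation. It assumes a hypothetical toy grammar in Chomsky normal form and uses a plain CKY chart as the WFST; the grammar, the sentences, and the helper names `build_support_graph` and `graphical_em_step` are illustrative assumptions. Parsing runs once to produce a support graph (nodes `(A, i, j)` with their rule-based explanations), and each EM iteration then only computes inside and outside probabilities over those graphs, followed by expected rule counts and renormalisation.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form: (lhs, rhs) -> probability.
# Rules, probabilities, and sentences are illustrative assumptions only.
GRAMMAR = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("she",)): 0.3,
    ("NP", ("fish",)): 0.5,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("eats",)): 1.0,
    ("P", ("with",)): 1.0,
}


def build_support_graph(words, grammar, root="S"):
    """Parsing phase (hypothetical helper): fill a CKY chart (the WFST), then keep
    only the nodes (A, i, j) and explanations reachable from (root, 0, n)."""
    n = len(words)
    chart = defaultdict(set)  # WFST: (i, j) -> nonterminals spanning words[i:j]
    expl = defaultdict(list)  # node (A, i, j) -> [(rule, [child nodes])]
    for i, w in enumerate(words):
        for rule in grammar:
            if rule[1] == (w,):  # lexical rule A -> w
                chart[(i, i + 1)].add(rule[0])
                expl[(rule[0], i, i + 1)].append((rule, []))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for rule in grammar:
                    lhs, rhs = rule
                    if len(rhs) == 2 and rhs[0] in chart[(i, k)] and rhs[1] in chart[(k, j)]:
                        chart[(i, j)].add(lhs)
                        expl[(lhs, i, j)].append((rule, [(rhs[0], i, k), (rhs[1], k, j)]))
    goal = (root, 0, n)
    if goal not in expl:
        raise ValueError("sentence is not derivable from the root symbol")
    # Prune: keep only nodes reachable from the goal (the support graph).
    reachable, stack = {goal}, [goal]
    while stack:
        for _, children in expl[stack.pop()]:
            for c in children:
                if c not in reachable:
                    reachable.add(c)
                    stack.append(c)
    # In CNF, children span fewer words than parents, so sorting by span
    # length is a topological order (children before parents).
    order = sorted(reachable, key=lambda v: v[2] - v[1])
    return goal, order, {v: expl[v] for v in order}


def graphical_em_step(corpus, theta):
    """Learning phase (hypothetical helper): one EM iteration over precomputed
    support graphs. Returns new parameters and the log-likelihood under theta."""
    counts = defaultdict(float)
    loglik = 0.0
    for goal, order, graph in corpus:
        inside = {}
        for v in order:  # children first
            inside[v] = sum(theta[r] * math.prod(inside[c] for c in ch) for r, ch in graph[v])
        outside = defaultdict(float)
        outside[goal] = 1.0
        for v in reversed(order):  # parents first, so outside[v] is complete
            for r, ch in graph[v]:
                for idx, c in enumerate(ch):
                    others = math.prod(inside[d] for m, d in enumerate(ch) if m != idx)
                    outside[c] += outside[v] * theta[r] * others
        z = inside[goal]
        loglik += math.log(z)
        for v in order:  # expected number of uses of each rule
            for r, ch in graph[v]:
                counts[r] += outside[v] * theta[r] * math.prod(inside[c] for c in ch) / z
    # M-step: renormalise expected counts per left-hand side.
    totals = defaultdict(float)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    new_theta = {r: counts[r] / totals[r[0]] if totals[r[0]] > 0 else p
                 for r, p in theta.items()}
    return new_theta, loglik


sentences = [["she", "eats", "fish"], ["she", "eats", "fish", "with", "fish"]]
corpus = [build_support_graph(s, GRAMMAR) for s in sentences]  # parse once
theta = dict(GRAMMAR)
lls = []
for _ in range(5):
    theta, ll = graphical_em_step(corpus, theta)
    lls.append(ll)
print(lls)
```

Because parsing and learning are decoupled here, any parser whose chart can be converted into `(rule, children)` explanations could feed `graphical_em_step` unchanged. Sorting by span length is a valid topological order only because the toy grammar is in CNF, with no unary chains between nonterminals.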

Similar resources

Experiments with Spectral Learning of Latent-Variable PCFGs

Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which—unlike the EM algorithm—is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. W...

Designing an Optimal Pattern of General Medical Course Curriculum: an Effective Step in Enhancing How to Learn

Introduction: In today's world with a vast amount of information and knowledge, medical students should learn how to become effective physicians. Therefore, the competencies required for lifelong learning in the curriculum must be considered. The purpose of this study was to present a desirable general medical curriculum with emphasis on lifelong learning. Methods: The present study was Mixe...

Non-Local Modeling with a Mixture of PCFGs

While most work on parsing with PCFGs has focused on local correlations between tree configurations, we attempt to model non-local correlations using a finite mixture of PCFGs. A mixture grammar fit with the EM algorithm shows improvement over a single PCFG, both in parsing accuracy and in test data likelihood. We argue that this improvement comes from the learning of specialized grammars that ...

Parameter Learning of Logic Programs for Symbolic-Statistical Modeling

We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally a...

Parallel EM Learning for Symbolic-Statistical Models

EM learning, i.e. parameter learning for probabilistic models using the EM algorithm, requires a larger amount of time and memory space as data size increases. One way to cope with this problem is to take advantage of the power of parallel computing. In this paper, we introduce a data-parallel algorithm for EM learning applicable to the probabilistic models represented by PRISM, a programming l...

Publication date: 2001